import pandas as pd
import seaborn as sns
import plotly.express as px
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import plotly.io as pio
pio.renderers.default = "plotly_mimetype+notebook"
For this exercise, we have written the following code to load the stock dataset built into Plotly Express.
stocks = px.data.stocks()
stocks.head()
| | date | GOOG | AAPL | AMZN | FB | NFLX | MSFT |
|---|---|---|---|---|---|---|---|
| 0 | 2018-01-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| 1 | 2018-01-08 | 1.018172 | 1.011943 | 1.061881 | 0.959968 | 1.053526 | 1.015988 |
| 2 | 2018-01-15 | 1.032008 | 1.019771 | 1.053240 | 0.970243 | 1.049860 | 1.020524 |
| 3 | 2018-01-22 | 1.066783 | 0.980057 | 1.140676 | 1.016858 | 1.307681 | 1.066561 |
| 4 | 2018-01-29 | 1.008773 | 0.917143 | 1.163374 | 1.018357 | 1.273537 | 1.040708 |
Select a stock and create a suitable plot for it. Make sure the plot is readable and shows the relevant information, such as dates and values.
fig, ax = plt.subplots(figsize=(12,8))
# Parse the date strings so the date locator below works on a real time axis
x = pd.to_datetime(stocks['date'])
y = stocks['GOOG']
ax.set(xlabel = 'Date', ylabel = 'Stock Value', title = 'Google stock')
ax.xaxis.set_major_locator(mdates.WeekdayLocator(interval=2))
ax.plot(x,y, linewidth = 2)
plt.show()
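Readability of a time series plot often depends on how the date ticks are placed and labelled. The following is a minimal sketch of one option, assuming monthly ticks suit the data; it uses a short made-up weekly series (not the plotly stocks data) so that it runs on its own, and the names `dates` and `values` are purely illustrative.

```python
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

# Illustrative weekly series: made-up values, not taken from the stocks dataset
dates = pd.date_range("2018-01-01", periods=12, freq="7D")
values = [1.00, 1.02, 1.03, 1.07, 1.01, 0.98, 1.00, 1.04, 1.06, 1.05, 1.08, 1.10]

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(dates, values, linewidth=2)

# One tick per month, labelled like "Jan 2018"
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))

# Rotate and right-align the tick labels so they do not overlap
fig.autofmt_xdate()

ax.set(xlabel="Date", ylabel="Stock Value", title="Illustrative weekly series")
```

The same locator and formatter calls can be applied to the Google plot above once its x values are datetimes rather than strings.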
You have already plotted data from one stock. It is also possible to plot several stocks in the same figure to support comparison.
To distinguish the lines, customise line styles, markers, and colors, and add a legend to the plot.
fig, ax = plt.subplots(figsize=(16, 8))
stocks_list = ['GOOG', 'AAPL', 'AMZN', 'FB', 'NFLX', 'MSFT']
colors = ['b', 'orange', 'g', 'r', 'purple', 'brown']
linestyles = ['-', '--', '-.', ':', '-', '--']
for i in range(len(stocks_list)):
    y = stocks[stocks_list[i]]
    ax.plot(x,y, color = colors[i], linestyle = linestyles[i])
plt.legend(stocks_list)
ax.xaxis.set_major_locator(mdates.WeekdayLocator(interval=2))
ax.set(xlabel = 'Date', ylabel = 'Stock Value', title = 'Multiple Stocks')
[Text(0.5, 0, 'Date'), Text(0, 0.5, 'Stock Value'), Text(0.5, 1.0, 'Multiple Stocks')]
First, load the tips dataset.
tips = sns.load_dataset('tips')
tips.head()
| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
Let's explore this dataset. Pose a question and create a plot that helps answer it.
Some possible questions:
print('The question answered is: Are there differences between male and female customers when it comes to giving tips?')
sns.boxplot(x = 'tip', y = 'sex', data = tips)
print('The question answered is: Which attribute correlates the most with tip?')
s = sns.FacetGrid(tips, col = 'day', hue = 'smoker')
s.map(sns.scatterplot, 'total_bill', 'tip')
s.add_legend()
plt.show()
print('The only attributes not shown are time of day and size, but the plots suggest that total bill has the most influence on the tip.')
The question answered is: Are there differences between male and female customers when it comes to giving tips?
The question answered is: Which attribute correlates the most with tip?
The only attributes not shown are time of day and size, but the plots suggest that total bill has the most influence on the tip.
Redo the above exercises (challenges 2 & 3) with plotly express. Create diagrams which you can interact with.
Hints:
# Use the date column as the index so px.line plots every stock against it
stocks.set_index('date', inplace = True)
px.line(stocks)
px.box(tips, x= 'tip', y = 'sex')
df = px.data.tips()
tips_sum = px.histogram(
    df, x="total_bill", y="tip", color="smoker",
    hover_data=df.columns
)
tips_sum.show()
fig = px.scatter(df, x = 'day', y = 'tip', color = 'size', hover_data = df.columns)
fig.update_xaxes(categoryorder = 'total ascending')
Recreate the bar plot below, which shows the population of each continent in 2007.
Hints:
#load data
df = px.data.gapminder()
df.head()
| | country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.445314 | AFG | 4 |
| 1 | Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.853030 | AFG | 4 |
| 2 | Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.100710 | AFG | 4 |
| 3 | Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.197138 | AFG | 4 |
| 4 | Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.981106 | AFG | 4 |
df_2007 = df.query('year==2007')
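# Sum only the population column; the other columns (names, codes, averages) are not meaningful to add up
df_2007_new = df_2007.groupby('continent')[['pop']].sum()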
continents = ['Africa', 'Americas', 'Asia', 'Europe', 'Oceania']
fig = px.bar(df_2007_new, x="pop", y=df_2007_new.index, orientation='h', color = continents, text = 'pop')
fig.update_yaxes(categoryorder = 'total ascending')
fig.show()